Abstract 

One way to build a reactive system is to construct an action table 
indexed by the current situation or stimulus. The action table de- 
scribes what course of action to pursue for each situation or stim- 
ulus. This paper describes an incremental approach to construct- 
ing the action table through achieving goals with a hierarchical 
search system. These hierarchies are generated with transforma- 
tions called concretizations, which add constraints to a problem 
and which can reduce the search space. The basic idea is that an 
action for a state is looked up in the action table and executed 
whenever the action table has an entry for that state; otherwise, 
a path is found to the nearest (cost-wise in a graph with cost- 
weighted arcs) state that has a mapping from a state in the next 
highest hierarchy. For each state along the solution path, the suc- 
cessor state in the path is cached in the action table entry for that 
state. Without caching, the hierarchical search system can loga- 
rithmically reduce search. When the table is complete the system 
no longer searches: it simply reacts by proceeding to the state 
listed in the table for each state. Since the cached information 
is specific only to the nearest state in the next highest hierarchy 
and not the goal, inter-goal transfer of reactivity is possible. To 
illustrate our approach, we show how an implemented hierarchical 
search system can become completely reactive. 
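
The lookup-or-search loop described above can be sketched in a few lines. This is a minimal single-level illustration, not the implemented system: the names `nearest_path`, `react`, `line_successors`, and `is_target` are hypothetical, and the predicate `is_target` merely stands in for "has a mapping from a state in the next highest hierarchy".

```python
import heapq
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple

State = Hashable
Successors = Callable[[State], Iterable[Tuple[State, float]]]


def nearest_path(start: State, successors: Successors,
                 is_target: Callable[[State], bool]) -> Optional[List[State]]:
    """Uniform-cost search from start to the cheapest state satisfying is_target."""
    frontier = [(0.0, 0, start, [start])]  # (cost, tie-breaker, state, path)
    best = {start: 0.0}
    counter = 1
    while frontier:
        cost, _, state, path = heapq.heappop(frontier)
        if cost > best[state]:
            continue  # stale queue entry
        if is_target(state):
            return path
        for nxt, step in successors(state):
            new_cost = cost + step
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(frontier, (new_cost, counter, nxt, path + [nxt]))
                counter += 1
    return None


def react(state: State, table: Dict[State, State], successors: Successors,
          is_target: Callable[[State], bool]) -> Optional[State]:
    """Return the next state: by table lookup if possible, else by search plus caching."""
    if state in table:
        return table[state]  # pure reaction, no search
    path = nearest_path(state, successors, is_target)
    if path is None or len(path) < 2:
        return None  # no reachable target, or already at a target
    for here, nxt in zip(path, path[1:]):
        table.setdefault(here, nxt)  # cache the successor of every state on the path
    return table[state]


# Toy world (an assumption for illustration): states 0..4 on a line,
# unit-cost moves to neighbours, and state 4 as the only target.
def line_successors(s: int) -> List[Tuple[int, float]]:
    return [(t, 1.0) for t in (s - 1, s + 1) if 0 <= t <= 4]


def is_target(s: int) -> bool:
    return s == 4


action_table: Dict[State, State] = {}
first = react(0, action_table, line_successors, is_target)   # searches, then caches
second = react(2, action_table, line_successors, is_target)  # answered from the table
```

After the first call the table already holds an entry for every state on the solution path, so the second call involves no search at all.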

1 Introduction and Motivation 

Intelligent interaction with the world can be viewed as 
a combination of planning to achieve some goal and of 
reaction to external stimuli in the course of executing 
a plan. A pure planning system produces a complete 
plan of actions before executing it [4, 8, 3, ;]. In 
contrast, a pure reactive system quickly selects and ex- 
ecutes a single action based on an external stimulus [2, 
1, 6]. Planning systems appear to work well when the 
predictability of the world is precisely captured in the 
planner’s actions, whereas reactive systems appear to 
work well in worlds that are fraught with uncertainty 
or unpredictability — where plans have little chance of 
succeeding in their entirety, where the ability to plan to 
completion is not a virtue. This paper describes how a 
planning system can incrementally become more reac- 
tive through interaction with its world. By becoming 
more reactive, the system reduces its decision-making 
time. 

Previous approaches to building reactive systems 
from non-reactive ones include compilation and learn- 
ing from examples. Firby [5] and Rosenschein [13] 
show how to compile high-level input descriptions of 
actions and goals into reactive systems. Similarly, 
Rosenschein and Kaelbling describe a technique to 
compile constraint expressions into directly executable 
circuits for a robotic control system [14]. Mitchell 
uses Explanation-Based Learning to incrementally 
learn the general conditions under which a particu- 
lar action, which helps achieve a particular planning 
goal, should be applied [10]. If the conditions are 
matched, the same action is applied — irrespective of 
the system’s current goal. The advantage of learning 
over compiling is that examples focus on those parts 
of the environment with which an intelligent agent ac- 
tually interacts; only those actions that are relevant to 
that interaction are compiled for reactivity. 

The problem with the Explanation-Based Learning 
approach is that multiple goals can lead to multiple 
action suggestions for the same state, which results in 
deliberation as to which action to apply and therefore 
less reactivity. This anomaly is commonly called 
the wandering bottleneck problem in the machine 
learning literature; as a result of eliminating one time 
bottleneck (e.g. time taken to react) another one 
unexpectedly arises (e.g. time taken to decide how to 
react). More precisely, in a problem with n problem- 
solving states, each state can have as many as n 
possible action suggestions since there can be as many 
as n goals from which the action suggestions are 
learned. Moreover, to store such a network of states 
and actions can require as much as $O(n^2 \log n)$ space 
over n goals and n states, since $O(\log n)$ space is 
required to store each action suggestion. If n is 
exponential in the problem size, then this approach is 
generally not feasible. 

This paper describes a technique to avoid the wan- 
dering bottleneck problem by hierarchically organiz- 
ing the state space such that at most one action is 
learned for each state. As a side benefit, this hierar- 
chical organization reduces the worst-case space re- 
quirements by a factor of n. 
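
The space figures quoted in this section follow from a simple count. The derivation is a sketch under the stated assumption that each stored action suggestion takes $O(\log n)$ space; the row labels are ours.

```latex
\begin{align*}
\text{one suggestion per (state, goal) pair:}\quad
  & n \cdot n \cdot O(\log n) = O(n^{2} \log n), \\
\text{at most one action per state:}\quad
  & n \cdot O(\log n) = O(n \log n), \\
\text{ratio of the two bounds:}\quad
  & \frac{n^{2} \log n}{n \log n} = n .
\end{align*}
```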

The rest of this paper is organized as follows. Sec- 
tion 2 defines the notion of a concretization and de- 
rives several important properties of concretizations in 
search. Section 3 describes our approach to becoming 
reactive by concretization. Section 4 presents experi- 
mental results of applying our approach. Finally, Sec- 
tion 5 summarizes the conclusions of this work and 
discusses a few promising avenues of future research. 

2 Concretizations 

Intuitively, a concretization of a problem is one that 
has added constraints. The importance of these added 
constraints is that they reduce the branching factor dur- 
ing search. To formalize this intuitive notion requires 
a definition of search. The definition that we will as- 
sume is standard in the AI literature [11]. A search 
problem can be thought of as consisting of a graph of 
nodes, which represent states, and directed arcs that 
represent the application of an operator. These arcs 
are typically weighted to represent the cost of apply- 
ing the corresponding operator. Search can be thought 
of as finding a finite path in the graph from a node rep- 
resenting a given initial state to a node representing a 
given goal state. The graph can be specified explicitly 
or implicitly. In an explicit specification, the nodes 
and arcs with associated costs might be supplied in 
a table that includes every node in the graph and a 
list of its successors and the costs of associated arcs. 
This information might also be specified by a matrix 
that stores the costs of associated arcs for every pair 
of nodes (an infinite cost arc represents the absence of 
an arc). In an implicit specification, only that portion 
of the graph that is sufficient to include a goal node 
is made explicit by applying operators using a search 
algorithm such as A* [11]. For example, in the Eight 
Puzzle problem, the set of states consists of all tile 
permutations and operators only allow swapping the 
blank with an adjacent tile (i.e. the cost function on a 
pair of states returns 1 if one state is reachable from 
the other by swapping the blank with an adjacent tile, 
and ∞ otherwise). The goal state might specify that 
the tiles are in a particular order. 
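
As an illustration of an implicitly specified graph, the Eight Puzzle cost function can be written directly from the description above. The board encoding (a 9-tuple in row-major order with 0 for the blank) and the names `blank_moves` and `cost` are assumptions made for this sketch.

```python
import math
from typing import Iterator, Tuple

Board = Tuple[int, ...]  # 9 entries, row-major order; 0 marks the blank


def blank_moves(board: Board) -> Iterator[Board]:
    """Yield every board reachable by swapping the blank with an adjacent tile."""
    i = board.index(0)
    row, col = divmod(i, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3:
            j = 3 * r + c
            cells = list(board)
            cells[i], cells[j] = cells[j], cells[i]
            yield tuple(cells)


def cost(s: Board, t: Board) -> float:
    """Return 1 if t is one blank swap away from s, and infinity otherwise."""
    return 1.0 if t in blank_moves(s) else math.inf


center = (1, 2, 3, 4, 0, 5, 6, 7, 8)    # blank in the middle: four legal swaps
neighbor = (1, 0, 3, 4, 2, 5, 6, 7, 8)  # blank swapped with the tile above it
```

Only the successors of states actually visited are ever generated, which is what makes the specification implicit.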

More formally, let a search problem be a 3-tuple 
$(S, c, G)$, where $S$ is a set of states describing situations 
of the world; $c : S \times S \to \mathbb{R}^{+} \cup \{\infty\}$ is a positive cost function 
that represents the cost of applying the corresponding 
action from one state to another; and $G \subseteq S$ is a set of 
goal states. An instance of a search problem includes 
a 2-tuple $(i, g)$, where $i \in S$ is the initial state and 
$g \in S$ is the goal state (for simplicity, we assume that 
there is only one goal state). The objective is to find 
a finite-length, finite-cost path from $i$ to $g$. 
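
A small explicit instance makes the definition concrete. The sketch assumes a cost table keyed by state pairs, with a missing entry read as an infinite-cost arc, and uses uniform-cost search as one possible way to find a finite-cost path; the names `arc_cost` and `solve` and the four-state graph are hypothetical.

```python
import heapq
import math
from typing import Dict, Hashable, Iterable, List, Optional, Tuple

State = Hashable
CostTable = Dict[Tuple[State, State], float]


def arc_cost(costs: CostTable, s: State, t: State) -> float:
    """Cost function c(s, t); a missing entry means there is no arc."""
    return costs.get((s, t), math.inf)


def solve(states: Iterable[State], costs: CostTable,
          i: State, g: State) -> Optional[Tuple[float, List[State]]]:
    """Return (cost, path) for a cheapest finite-cost path from i to g, or None."""
    states = list(states)
    frontier = [(0.0, 0, i, [i])]  # (cost so far, tie-breaker, state, path)
    best = {i: 0.0}
    counter = 1
    while frontier:
        cost, _, s, path = heapq.heappop(frontier)
        if cost > best[s]:
            continue  # stale queue entry
        if s == g:
            return cost, path
        for t in states:
            new_cost = cost + arc_cost(costs, s, t)
            if new_cost < best.get(t, math.inf):
                best[t] = new_cost
                heapq.heappush(frontier, (new_cost, counter, t, path + [t]))
                counter += 1
    return None


S = ["A", "B", "C", "D"]
c = {("A", "B"): 1.0, ("B", "C"): 2.0, ("A", "C"): 5.0, ("C", "D"): 1.0}
forward = solve(S, c, "A", "D")   # cheapest route goes through B and C
backward = solve(S, c, "D", "A")  # no arcs leave D, so no path exists
```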


A problem $(S', c')$ is a concretization of another 
problem $(S, c)$ with respect to $\phi : S' \to S$ iff $\phi$ 
reduces cost: $(\forall s', t' \in S')\; c(\phi(s'), \phi(t')) \le c'(s', t')$. 
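
For small explicit problems the cost-reduction condition can be checked by brute force. The sketch below assumes the condition reads $c(\phi(s'), \phi(t')) \le c'(s', t')$ for every pair of concrete states, and again treats a missing table entry as an infinite-cost arc; `is_concretization`, `strip_tag`, and the toy states are hypothetical names chosen for illustration.

```python
import math
from itertools import product
from typing import Callable, Dict, Hashable, Iterable, Tuple

State = Hashable
CostTable = Dict[Tuple[State, State], float]


def arc_cost(costs: CostTable, s: State, t: State) -> float:
    """A missing entry is read as an infinite-cost (absent) arc."""
    return costs.get((s, t), math.inf)


def is_concretization(concrete_states: Iterable[State], concrete_cost: CostTable,
                      original_cost: CostTable,
                      phi: Callable[[State], State]) -> bool:
    """Check c(phi(s'), phi(t')) <= c'(s', t') for all concrete pairs (s', t')."""
    concrete_states = list(concrete_states)
    return all(
        arc_cost(original_cost, phi(s), phi(t)) <= arc_cost(concrete_cost, s, t)
        for s, t in product(concrete_states, repeat=2)
    )


def strip_tag(s: str) -> str:
    """Mapping phi for the toy example: drop the extra constraint tag."""
    return s[0]


original_cost = {("a", "b"): 1.0, ("b", "a"): 1.0}
concrete_states = ["a0", "a1", "b0"]
concrete_cost = {("a0", "b0"): 1.0}  # fewer arcs than the original problem

# Adding an arc that has no counterpart in the original problem breaks the condition.
too_permissive = dict(concrete_cost)
too_permissive[("a0", "a1")] = 1.0
```

In this toy case the first concrete problem passes the check because every one of its arcs maps onto an original arc of no greater cost, while `too_permissive` fails because the original problem has no arc from `a` to itself.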

For example, Figure 1 shows a concretization of 
the Towers of Hanoi problem. The original problem 
is composed of operators that stack smaller disks on 
top of larger disks from pin to pin; states are simply 
disks stacked in increasing size on various pins. The 
initial and goal states for a typical three disk instance 
of the Towers of Hanoi problem are also shown in 
the figure. If the disks are numbered from top to bot- 
tom and then the operators are constrained such that 
they never place an odd-numbered disk on an even- 
numbered disk and vice versa, then this new problem 
is a concretization of the original problem with respect 
to a mapping function that ignores disk parity. The 
reason is that the cost is reduced: operators apply 
more often in the original problem. Notice that any so- 
lution in the concrete space is guaranteed to be a solu- 
tion in the original space because the concretized prob- 
lem is more restricted. Since the branching factor will 
be lower for the concretized problem, solution gen- 
eration will be more efficient (though slightly longer 
solutions will generally result). This property, which 
we call solution-soundness , is perhaps most powerful 
when a problem can be concretized into one for which 
an efficient solution generator exists. Any solution to 
the concretized problem can then be directly mapped 
onto a solution to the original problem. For example, 
a Blocks World problem with three table locations can 
be concretized into a Towers of Hanoi problem, which 
has an associated divide-and-conquer algorithm, by as- 
signing a “size” to each block (say, small to large for 
each block on every stack, consistent in the initial and 
goal states). Any solution to the corresponding Tow- 
ers of Hanoi problem can be mapped onto a solution 
to the original problem simply by ignoring size. 
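
The Blocks World example can be traced end to end. The sketch assumes three table locations numbered 0 to 2, a single initial stack whose blocks are sized large to small from bottom to top, and the textbook recursive Towers of Hanoi procedure as the efficient solution generator; `hanoi_moves` and `apply_blocks_moves` are hypothetical names.

```python
from typing import Iterator, List, Tuple

Move = Tuple[int, int]  # (source location, destination location)


def hanoi_moves(n: int, src: int, dst: int, aux: int) -> Iterator[Move]:
    """Divide-and-conquer solution for moving n disks from src to dst."""
    if n == 0:
        return
    yield from hanoi_moves(n - 1, src, aux, dst)
    yield (src, dst)
    yield from hanoi_moves(n - 1, aux, dst, src)


def apply_blocks_moves(stacks: List[List[str]], moves: List[Move]) -> List[List[str]]:
    """Replay moves in the Blocks World, which ignores size: only the top block moves."""
    stacks = [list(s) for s in stacks]
    for src, dst in moves:
        if not stacks[src]:
            raise ValueError("illegal Blocks World move: empty location")
        stacks[dst].append(stacks[src].pop())
    return stacks


# Blocks listed bottom to top; C is treated as the largest "disk" and A as the smallest.
initial = [["C", "B", "A"], [], []]
plan = list(hanoi_moves(3, 0, 2, 1))      # solve the concretized problem
final = apply_blocks_moves(initial, plan)  # map the plan back by ignoring size
```

Every move in `plan` is legal in the Blocks World because the Blocks World places no restriction on what a block may rest on, which is the solution-soundness property in miniature.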

Tenenberg describes a similar property, which he 
calls the downward solution property, in the context 
of planning with a certain type of operator representa- 
tion [16]. In his terminology, a transformed problem 
has the downward solution property if every solution 
in the transformed space can be mapped onto one for 
the original problem. Solution soundness is a gener- 
alization of the downward solution property since it 
does not depend on specific operator representations. 

Despite the solution-soundness property of con- 
cretizations, a solvable problem in the original space 
Figure 4 Becoming Reactive Through Interaction with the World 


only those states that are most frequently encountered 
or apply learning techniques to reduce table size. In 
particular, we are currently investigating applying our 
ideas to a less artificial problem, a robot routing task, 
which includes explicitly specified operators with in- 
puts from external sensors. It might be possible to apply 
Explanation-Based or inductive learning to learn the class 
of states that lead to the nearest state in the next highest 
hierarchy. 

Another problem is that constructing concretization 
hierarchies is generally difficult. However, 
a catalog of problem transformations such as those of 
Absolver II [12] might prove helpful. Another method 
might be to use clustering algorithms to group simi- 
lar states into equivalence classes. Problem-solving 
performance with more meaningful groupings — those 
that exploit the structure of the search graph and sim- 
ilarity of states — should be improved over the results 
we obtained with random hierarchical groupings. 

Ultimately, we would like to test our ideas in a 
dynamic world where an intelligent agent’s plans to 
achieve goals are continually thwarted by unforeseen 
events to which the agent has to react immediately, 
recover, and then proceed towards achieving the goal. 
We believe that a hierarchical learning system of the 
sort described here may be especially suited for such 
worlds. We are currently modeling a dynamic world 
and testing this hypothesis. 